Toward automatic censorship detection in microblogs
Social media is an area where users often experience censorship through a
variety of means such as the restriction of search terms or active and
retroactive deletion of messages. In this paper we examine the feasibility of
automatically detecting censorship of microblogs. We use a network growing
model to simulate discussion over a microblog follow network and compare two
censorship strategies to simulate varying levels of message deletion. Using
topological features extracted from the resulting graphs, a classifier is
trained to detect whether or not a given communication graph has been censored.
The results show that censorship detection is feasible under empirically
measured levels of message deletion. The proposed framework can enable
automated censorship measurement and tracking, which, when combined with
aggregated citizen reports of censorship, can allow users to make informed
decisions about online communication habits.
Comment: 13 pages. Updated with example cascades figure and typo fixes. To
appear at the International Workshop on Data Mining in Social Networks
(PAKDD-SocNet) 201
Spoken affect classification : algorithms and experimental implementation : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University, Palmerston North, New Zealand
Machine-based emotional intelligence is a requirement for natural interaction between humans and computer interfaces and a basic level of accurate emotion perception is needed for computer systems to respond adequately to human emotion. Humans convey emotional information both intentionally and unintentionally via speech patterns. These vocal patterns are perceived and understood by listeners during conversation. This research aims to improve the automatic perception of vocal emotion in two ways. First, we compare two emotional speech data sources: natural, spontaneous emotional speech and acted or portrayed emotional speech. This comparison demonstrates the advantages and disadvantages of both acquisition methods and how these methods affect the end application of vocal emotion recognition. Second, we look at two classification methods which have gone unexplored in this field: stacked generalisation and unweighted vote. We show how these techniques can yield an improvement over traditional classification methods
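As an illustration of the unweighted vote named in the abstract above, the following is a minimal sketch and not the thesis's implementation. It assumes each base classifier has already produced one emotion label for an utterance; the function name `unweighted_vote`, the example labels, and the tie-breaking rule are hypothetical choices made for this example.

```python
from collections import Counter


def unweighted_vote(predictions):
    """Return the label chosen by the most base classifiers.

    Every classifier's vote counts equally. Ties are broken in favour of
    the label that appears first in `predictions` (an assumption made for
    this sketch, not something stated in the abstract).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Walk the votes in their original order so ties resolve predictably.
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of three base classifiers for one utterance.
base_outputs = ["angry", "neutral", "angry"]
print(unweighted_vote(base_outputs))
```

Stacked generalisation differs in that a second-level classifier is trained on the base classifiers' outputs rather than counting them equally.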
Phantom cascades: The effect of hidden nodes on information diffusion
Research on information diffusion generally assumes complete knowledge of the
underlying network. However, in the presence of factors such as increasing
privacy awareness, restrictions on application programming interfaces (APIs)
and sampling strategies, this assumption rarely holds in the real world which
in turn leads to an underestimation of the size of information cascades. In
this work we study the effect of hidden network structure on information
diffusion processes. We characterise information cascades through activation
paths traversing visible and hidden parts of the network. We quantify diffusion
estimation error while varying the amount of hidden structure in five empirical
and synthetic network datasets and demonstrate the effect of topological
properties on this error. Finally, we suggest practical recommendations for
practitioners and propose a model to predict the cascade size with minimal
information regarding the underlying network.
Comment: Preprint submitted to Elsevier Computer Communication
Topic modelling of clickthrough data in image search
In this paper we explore the benefits of latent variable modelling of clickthrough data in the domain of image retrieval. Clicks in image search logs are regarded as implicit relevance judgements that express both user intent and important relations between selected documents. We posit that clickthrough data contains hidden topics and can be used to infer a lower dimensional latent space that can be subsequently employed to improve various aspects of the retrieval system. We use a subset of a clickthrough corpus from the image search portal of a news agency to evaluate several popular latent variable models in terms of their ability to model topics underlying queries. We demonstrate that latent variable modelling reveals underlying structure in clickthrough data and our results show that computing document similarities in the latent space improves retrieval effectiveness compared to computing similarities in the original query space. These results are compared with baselines using visual and textual features. We show performance substantially better than the visual baseline, which indicates that content-based image retrieval systems that do not exploit query logs could improve recall and precision by taking this historical data into account.
Latent variable modelling of user interaction in image retrieval
This thesis studies latent variable models of user interaction with the aim of improving image retrieval. Search histories, known as query logs, in which the interaction between users and the retrieval system is recorded, often contain indications of intent in the form of relevance judgements given on documents in the context of a search. Depending on the nature of the retrieval system and the interaction it allows, these judgements may be explicit or implicit, and, once aggregated over a large number of searches carried out by many users, they can be exploited to improve various aspects of the retrieval system. This thesis proposes a model of query logs, the User Relevance Model (Modèle de Pertinence Utilisateur), in which relevance judgements arise from a generative process whereby the user judges (either implicitly or explicitly) a document as relevant if it shares a degree of overlap with the query in terms of concepts, and as not relevant otherwise.
Semantic clustering of images using patterns of relevance feedback
User-supplied data such as browsing logs, click-through data, and relevance feedback judgements are an important source of knowledge during semantic indexing of documents such as images and video. Low-level indexing and abstraction methods are limited in how they can deal with semantic data. In this paper, in the context of this semantic data, we apply latent semantic analysis to two forms of user-supplied data, real-world and artificially generated relevance feedback judgements, in order to examine the validity of using artificially generated interaction data for the study of semantic image clustering.
Evolutionary Clustering and Analysis of User Behaviour in Online Forums
In this paper we cluster and analyse temporal user behaviour in online communities. We adapt a simple unsupervised clustering algorithm to an evolutionary setting where we cluster users into prototypical behavioural roles based on features derived from their ego-centric reply-graphs. We then analyse changes in the role membership of the users over time, the change in role composition of forums over time and examine the differences between forums in terms of role composition. We perform this analysis on 200 forums from a popular national bulletin board and 14 enterprise technical support forums
L.C.D.: Ensemble methods for spoken emotion recognition in call-centers. Speech communication 49,
Machine-based emotional intelligence is a requirement for more natural interaction between humans and computer interfaces and a basic level of accurate emotion perception is needed for computer systems to respond adequately to human emotion. Humans convey emotional information both intentionally and unintentionally via speech patterns. These vocal patterns are perceived and understood by listeners during conversation. This research aims to improve the automatic perception of vocal emotion in two ways. First, we compare two emotional speech data sources: natural, spontaneous emotional speech and acted or portrayed emotional speech. This comparison demonstrates the advantages and disadvantages of both acquisition methods and how these methods affect the end application of vocal emotion recognition. Second, we look at two classification methods which have not been applied in this field: stacked generalisation and unweighted vote. We show how these techniques can yield an improvement over traditional classification methods.